Análisis multivariante

PCA y NMDS

  • Introducción

  • Análisis de componentes principales (PCA)

  • Escalamiento multidimensional no métrico (NMDS)

Introducción

Método estadístico que analiza simultáneamente múltiples características en cada muestra objeto de estudio.

Algunos ejemplos:

  • Análisis de componentes principales

  • Análisis de correspondencias

  • Escalamiento multidimensional

  • Análisis de correspondencias canónico

  • Etc.

Análisis de componentes principales (PCA)

El objetivo es reducir un gran número de variables perdiendo la menor cantidad de información posible.

Los nuevos componentes principales o factores serán una combinación lineal de las variables originales e independientes entre sí.

La interpretación de los factores se deducirá observando su relación con las variables iniciales.

Fases:

  1. Análisis de la matriz de correlaciones. Deben existir altas correlaciones entre las variables.

  2. Selección de los factores. El primer factor recoje la mayor proporción de variabilidad original, el segundo, la máxima no recogida por el primero, etc.

  3. Análisis de la matriz factorial. Se representan los factores seleccionados en forma de matriz.

  4. Interpretación de los factores

Ejemplo modelización riqueza plantas exóticas a partir del clima

Modelar riqueza de especies exóticas utilizando variables climáticas.

clima <- read.table("exoticas.txt", header = T, sep = "\t")
str(clima)
'data.frame':   2243 obs. of  13 variables:
 $ Alien               : int  23 32 25 46 35 89 38 46 40 4 ...
 $ Mean.Temperature    : num  6.86 7.39 5.3 7.71 7.39 ...
 $ Mean.Jan.Temperature: num  3.27 3.46 2.29 3.31 2.91 ...
 $ Rango.de.temperatura: num  4.84 6 3.98 6.46 6.53 ...
 $ PET                 : num  518 600 592 607 601 ...
 $ Min.pET             : num  8.44 13.89 12.98 12.7 11.82 ...
 $ Max.pET             : num  89.9 101.8 101.5 105.4 105.5 ...
 $ Insolation          : num  2.79 2.8 3.04 3.28 3.2 ...
 $ Growth.Season       : num  282 291 205 275 263 ...
 $ AET                 : num  459 484 434 459 451 ...
 $ Water.Defcit        : num  58.4 115.6 158 148.8 150.4 ...
 $ Precipitation       : num  1392 1605 855 959 958 ...
 $ Rainfall            : num  1392 1605 855 959 958 ...

Alien : variable respuesta

library(corrgram)
corrgram(clima[,-1], lower.panel=panel.pts, upper.panel=panel.conf, diag.panel=panel.density)

PCA para reducir la dimensionalidad y utilizar factores principales para modelar la riqueza de especies exóticas

pca_1 <- prcomp(clima[,-1], scale = T)
summary(pca_1)
Importance of components:
                          PC1    PC2     PC3     PC4     PC5     PC6     PC7
Standard deviation     2.6437 1.7183 0.98154 0.71722 0.47717 0.39998 0.35721
Proportion of Variance 0.5824 0.2461 0.08028 0.04287 0.01897 0.01333 0.01063
Cumulative Proportion  0.5824 0.8285 0.90876 0.95162 0.97060 0.98393 0.99456
                           PC8     PC9    PC10    PC11     PC12
Standard deviation     0.20319 0.12922 0.08389 0.01386 0.007336
Proportion of Variance 0.00344 0.00139 0.00059 0.00002 0.000000
Cumulative Proportion  0.99800 0.99939 0.99998 1.00000 1.000000
pca_1$rotation[,1:2]
                             PC1         PC2
Mean.Temperature      0.34852153 -0.16985773
Mean.Jan.Temperature  0.30722684 -0.31362840
Rango.de.temperatura  0.21576733  0.13711343
PET                   0.35433070 -0.09847938
Min.pET               0.27654607 -0.28400149
Max.pET               0.31976844  0.21453683
Insolation            0.33246966 -0.05442222
Growth.Season         0.32063663 -0.26539819
AET                  -0.01362093 -0.54991427
Water.Defcit          0.23318824  0.40121923
Precipitation        -0.28774698 -0.30042944
Rainfall             -0.28741001 -0.30094185
library(factoextra)
fviz_pca_var(pca_1, col.var = "black")
Ajustar modelo estadístico riqueza especies exóticas frente a los dos ejes principales
clima$pca1 <- pca_1$x[,1]
clima$pca2 <- pca_1$x[, 2]
glm_exoticas1 <- glm(Alien ~ pca1 + pca2, clima, family = "poisson")
anova(glm_exoticas1)
Analysis of Deviance Table

Model: poisson, link: log

Response: Alien

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                  2242     153463              
pca1  1   106723      2241      46740 < 2.2e-16 ***
pca2  1      867      2240      45873 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Supuestos del modelo

par(mfcol=c(2,2))
plot(glm_exoticas1)
library(aods3)
gof(glm_exoticas1)
D  = 45873.44, df = 2240, P(>D) = 0
X2 = 47198.34, df = 2240, P(>X2) = 0

Modelo binomial negativo

library(MASS)
glm_exoticas2 <- glm.nb(Alien ~ pca1 + pca2, clima)
par(mfcol=c(2,2))
plot(glm_exoticas2)
gof(glm_exoticas2)
D  = 2407.635, df = 2240, P(>D) = 0.007053173
X2 = 2257.158, df = 2240, P(>X2) = 0.3952556
anova(glm_exoticas2)
Analysis of Deviance Table

Model: Negative Binomial(5.4769), link: log

Response: Alien

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                  2242     7065.4              
pca1  1   4645.5      2241     2419.8 < 2.2e-16 ***
pca2  1     12.2      2240     2407.6 0.0004822 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(visreg)
par(mfcol=c(1,2))
visreg(glm_exoticas2, scale="response", ylab="Riqueza de especies")

Escalamiento multidimensional no métrico (NMDS)

Representación de las distancias existentes entre un conjunto de datos en un espacio geométrico de pocas dimensiones.

1. Cálculo matriz de disimilaridad X a partir de la matriz de datos.

2. Asignación de unidades muestrales a una configuración inicial aleatoria

3. Cálculo distancias otra vez en este nuevo espacio geométrico y se calcula una matriz de distancia Y

4. Se comparan las matrices de distancia X e Y y se miden cómo son de parecidas entre ellas (stress)

5. A partir de la configuración inicial, se reasignan las unidades muestrales para reducir las distancia con la matriz X

6. Repetición iterativa del proceso hasta que se consigue una solución óptima (se minimiza el stress)

Gradiente de composición florística en bosques tropicales montanos

Investigar qué variables ambientales afectan a la composición florística de árboles en distintos tipos de bosques tropicales. Abundancia de cada especie en cada parcela.

1. Explorar visualmente cómo son de similares las parcelas muestreadas en función de la composición de especies

2. Investigar relación de esta ordenación y las variables ambientales.

bio <- read.table("MANOVA-bio.txt", header=T, sep="\t")
env <- read.table("MANOVA-env.txt", header=T, sep="\t")
library(vegan)
set.seed(0)
nmds_1 <- metaMDS(bio, distance = "bray", try = 200, trymax = 200)
Square root transformation
Wisconsin double standardization
Run 0 stress 0.1813191 
Run 1 stress 0.189724 
Run 2 stress 0.1914815 
Run 3 stress 0.1827766 
Run 4 stress 0.1888635 
Run 5 stress 0.1762361 
... New best solution
... Procrustes: rmse 0.01048257  max resid 0.1464194 
Run 6 stress 0.1927798 
Run 7 stress 0.1969669 
Run 8 stress 0.1814796 
Run 9 stress 0.1762361 
... New best solution
... Procrustes: rmse 0.000678957  max resid 0.008967172 
... Similar to previous best
Run 10 stress 0.176237 
... Procrustes: rmse 0.0008545313  max resid 0.01134997 
Run 11 stress 0.2003403 
Run 12 stress 0.1867425 
Run 13 stress 0.1871088 
Run 14 stress 0.1810092 
Run 15 stress 0.1876214 
Run 16 stress 0.1806629 
Run 17 stress 0.1930107 
Run 18 stress 0.1762369 
... Procrustes: rmse 0.0002173221  max resid 0.002312363 
... Similar to previous best
Run 19 stress 0.1841313 
Run 20 stress 0.1762357 
... New best solution
... Procrustes: rmse 8.135117e-05  max resid 0.0010794 
... Similar to previous best
Run 21 stress 0.1900143 
Run 22 stress 0.1868287 
Run 23 stress 0.1907046 
Run 24 stress 0.186743 
Run 25 stress 0.1762374 
... Procrustes: rmse 0.0002461696  max resid 0.002268407 
... Similar to previous best
Run 26 stress 0.1969186 
Run 27 stress 0.1762361 
... Procrustes: rmse 0.000557169  max resid 0.007301201 
... Similar to previous best
Run 28 stress 0.1867422 
Run 29 stress 0.1797416 
Run 30 stress 0.1867433 
Run 31 stress 0.1784679 
Run 32 stress 0.1907052 
Run 33 stress 0.1828305 
Run 34 stress 0.1868239 
Run 35 stress 0.1900136 
Run 36 stress 0.1784677 
Run 37 stress 0.1881263 
Run 38 stress 0.1900146 
Run 39 stress 0.1812821 
Run 40 stress 0.18148 
Run 41 stress 0.1762366 
... Procrustes: rmse 0.0007213377  max resid 0.009669698 
... Similar to previous best
Run 42 stress 0.2091809 
Run 43 stress 0.1841312 
Run 44 stress 0.1812541 
Run 45 stress 0.1806618 
Run 46 stress 0.1846137 
Run 47 stress 0.1925728 
Run 48 stress 0.2017255 
Run 49 stress 0.1868236 
Run 50 stress 0.1793479 
Run 51 stress 0.1762362 
... Procrustes: rmse 0.0006659768  max resid 0.00892909 
... Similar to previous best
Run 52 stress 0.2041754 
Run 53 stress 0.176236 
... Procrustes: rmse 0.0005829137  max resid 0.007660708 
... Similar to previous best
Run 54 stress 0.1762371 
... Procrustes: rmse 0.0001766661  max resid 0.001593336 
... Similar to previous best
Run 55 stress 0.201832 
Run 56 stress 0.188662 
Run 57 stress 0.1762368 
... Procrustes: rmse 0.0005127457  max resid 0.00592756 
... Similar to previous best
Run 58 stress 0.1868218 
Run 59 stress 0.1762354 
... New best solution
... Procrustes: rmse 0.0003546211  max resid 0.004754731 
... Similar to previous best
Run 60 stress 0.1793475 
Run 61 stress 0.1961999 
Run 62 stress 0.1841311 
Run 63 stress 0.1905914 
Run 64 stress 0.1955569 
Run 65 stress 0.1762375 
... Procrustes: rmse 0.0005533177  max resid 0.007094434 
... Similar to previous best
Run 66 stress 0.1873162 
Run 67 stress 0.1762353 
... New best solution
... Procrustes: rmse 6.371811e-05  max resid 0.0008595397 
... Similar to previous best
Run 68 stress 0.1841318 
Run 69 stress 0.1762367 
... Procrustes: rmse 0.0002742295  max resid 0.002257947 
... Similar to previous best
Run 70 stress 0.1809903 
Run 71 stress 0.1806618 
Run 72 stress 0.2070922 
Run 73 stress 0.1848277 
Run 74 stress 0.1762355 
... Procrustes: rmse 0.0002205998  max resid 0.002960223 
... Similar to previous best
Run 75 stress 0.1940907 
Run 76 stress 0.1793489 
Run 77 stress 0.1762366 
... Procrustes: rmse 0.0003673795  max resid 0.004916036 
... Similar to previous best
Run 78 stress 0.1846118 
Run 79 stress 0.1811326 
Run 80 stress 0.1762382 
... Procrustes: rmse 0.0003724917  max resid 0.002683883 
... Similar to previous best
Run 81 stress 0.1762357 
... Procrustes: rmse 0.0003326045  max resid 0.004462617 
... Similar to previous best
Run 82 stress 0.1762363 
... Procrustes: rmse 0.0003420567  max resid 0.004310418 
... Similar to previous best
Run 83 stress 0.1797412 
Run 84 stress 0.1762378 
... Procrustes: rmse 0.0004579141  max resid 0.005689723 
... Similar to previous best
Run 85 stress 0.1827383 
Run 86 stress 0.1806621 
Run 87 stress 0.1868278 
Run 88 stress 0.1762366 
... Procrustes: rmse 0.0003862705  max resid 0.004936148 
... Similar to previous best
Run 89 stress 0.1788668 
Run 90 stress 0.1952555 
Run 91 stress 0.1806617 
Run 92 stress 0.194699 
Run 93 stress 0.176236 
... Procrustes: rmse 0.0002694521  max resid 0.003259305 
... Similar to previous best
Run 94 stress 0.1882663 
Run 95 stress 0.1784668 
Run 96 stress 0.1861735 
Run 97 stress 0.1784682 
Run 98 stress 0.184131 
Run 99 stress 0.184829 
Run 100 stress 0.1827756 
Run 101 stress 0.1841308 
Run 102 stress 0.1861665 
Run 103 stress 0.1827755 
Run 104 stress 0.1762367 
... Procrustes: rmse 0.0003568748  max resid 0.004308465 
... Similar to previous best
Run 105 stress 0.2006077 
Run 106 stress 0.1814799 
Run 107 stress 0.1784683 
Run 108 stress 0.1793487 
Run 109 stress 0.1784685 
Run 110 stress 0.1762358 
... Procrustes: rmse 0.0002237247  max resid 0.002995317 
... Similar to previous best
Run 111 stress 0.1784679 
Run 112 stress 0.1868273 
Run 113 stress 0.1881279 
Run 114 stress 0.2025052 
Run 115 stress 0.2076421 
Run 116 stress 0.2014525 
Run 117 stress 0.20537 
Run 118 stress 0.1815359 
Run 119 stress 0.1841231 
Run 120 stress 0.1868274 
Run 121 stress 0.1861735 
Run 122 stress 0.1876213 
Run 123 stress 0.1868278 
Run 124 stress 0.1814789 
Run 125 stress 0.1841231 
Run 126 stress 0.1784673 
Run 127 stress 0.1856853 
Run 128 stress 0.1784671 
Run 129 stress 0.176237 
... Procrustes: rmse 0.0004403699  max resid 0.005373281 
... Similar to previous best
Run 130 stress 0.2017536 
Run 131 stress 0.1812534 
Run 132 stress 0.1881266 
Run 133 stress 0.1813021 
Run 134 stress 0.178467 
Run 135 stress 0.1762371 
... Procrustes: rmse 0.0004431638  max resid 0.004958917 
... Similar to previous best
Run 136 stress 0.1980442 
Run 137 stress 0.1784665 
Run 138 stress 0.1867407 
Run 139 stress 0.1784666 
Run 140 stress 0.1793485 
Run 141 stress 0.1881263 
Run 142 stress 0.1857336 
Run 143 stress 0.2084805 
Run 144 stress 0.1762355 
... Procrustes: rmse 0.0001216775  max resid 0.001623193 
... Similar to previous best
Run 145 stress 0.1793475 
Run 146 stress 0.1762368 
... Procrustes: rmse 0.0004021772  max resid 0.005153592 
... Similar to previous best
Run 147 stress 0.1827755 
Run 148 stress 0.1891872 
Run 149 stress 0.1827755 
Run 150 stress 0.1762373 
... Procrustes: rmse 0.000497628  max resid 0.006287264 
... Similar to previous best
Run 151 stress 0.1867423 
Run 152 stress 0.1841322 
Run 153 stress 0.1814788 
Run 154 stress 0.1846855 
Run 155 stress 0.1793477 
Run 156 stress 0.1784684 
Run 157 stress 0.1788671 
Run 158 stress 0.1814783 
Run 159 stress 0.1762377 
... Procrustes: rmse 0.000558887  max resid 0.007072896 
... Similar to previous best
Run 160 stress 0.1908243 
Run 161 stress 0.1794581 
Run 162 stress 0.1793476 
Run 163 stress 0.1784678 
Run 164 stress 0.1841305 
Run 165 stress 0.1762359 
... Procrustes: rmse 0.0002694401  max resid 0.003608414 
... Similar to previous best
Run 166 stress 0.1762358 
... Procrustes: rmse 0.0003455068  max resid 0.004633095 
... Similar to previous best
Run 167 stress 0.1938621 
Run 168 stress 0.2059901 
Run 169 stress 0.201116 
Run 170 stress 0.1788661 
Run 171 stress 0.181282 
Run 172 stress 0.1827772 
Run 173 stress 0.1841227 
Run 174 stress 0.1924749 
Run 175 stress 0.1805974 
Run 176 stress 0.1762356 
... Procrustes: rmse 0.000280365  max resid 0.003762712 
... Similar to previous best
Run 177 stress 0.1762367 
... Procrustes: rmse 0.000397959  max resid 0.005328579 
... Similar to previous best
Run 178 stress 0.1762363 
... Procrustes: rmse 0.0003092645  max resid 0.003841927 
... Similar to previous best
Run 179 stress 0.1867429 
Run 180 stress 0.1762356 
... Procrustes: rmse 0.0001598903  max resid 0.001479898 
... Similar to previous best
Run 181 stress 0.1876923 
Run 182 stress 0.1827756 
Run 183 stress 0.1762355 
... Procrustes: rmse 0.0001304249  max resid 0.001747588 
... Similar to previous best
Run 184 stress 0.1762369 
... Procrustes: rmse 0.0004370817  max resid 0.00564469 
... Similar to previous best
Run 185 stress 0.2036807 
Run 186 stress 0.1841236 
Run 187 stress 0.1821859 
Run 188 stress 0.1762366 
... Procrustes: rmse 0.0003872861  max resid 0.00494709 
... Similar to previous best
Run 189 stress 0.1762364 
... Procrustes: rmse 0.0003462769  max resid 0.004370201 
... Similar to previous best
Run 190 stress 0.1881265 
Run 191 stress 0.1762381 
... Procrustes: rmse 0.0003599562  max resid 0.002512741 
... Similar to previous best
Run 192 stress 0.2003486 
Run 193 stress 0.1762357 
... Procrustes: rmse 0.0003177059  max resid 0.004263728 
... Similar to previous best
Run 194 stress 0.2015027 
Run 195 stress 0.1861732 
Run 196 stress 0.1881268 
Run 197 stress 0.2029969 
Run 198 stress 0.1793489 
Run 199 stress 0.1762372 
... Procrustes: rmse 0.0003698368  max resid 0.004101882 
... Similar to previous best
Run 200 stress 0.1876549 
*** Best solution repeated 31 times
plot(nmds_1)

Gráfico con diferencias por tipo de bosque y relación lineal con variables

coordenadas <- as.data.frame(scores(nmds_1)$sites)
coordenadas$Forest = env$Forest
envfit_res <- envfit(nmds_1, env, permutations = 999)
env_vectors <- as.data.frame(scores(envfit_res, display = "vectors"))
env_vectors$Variable <- rownames(env_vectors)

ggplot(coordenadas, aes(x = NMDS1, y = NMDS2))+ 
  geom_point(size = 4, aes( shape = Forest, colour = Forest)) +
# Vectores ambientales (flechas)
  geom_segment(data = env_vectors,
               aes(x = 0, xend = NMDS1, y = 0, yend = NMDS2),
               arrow = arrow(length = unit(0.3, "cm")),
               color = "black") +
  # Nombres de las variables
  geom_text(data = env_vectors,
            aes(x = NMDS1 * 1.1, y = NMDS2 * 1.1, label = Variable),
            color = "black",
            size = 4) +
    theme_minimal()